class: center, middle, inverse, title-slide .title[ # Spatial Data Workshop: Working with Census Data and Introduction to Map Making ] .author[ ### Josemari Feliciano ] .institute[ ### Department of Math & Statistics, American University ] .date[ ### December 2024 Guest lecture Yale BIS 679 - Advanced Statistical Programming in SAS and R ] --- ## Today's agenda - Discuss US-Based Geospatial and Demographic Data. - Things to learn after this course (share resources, advice). - Open discussion at the end where you can ask any question about R, RStudio, internships, and data science. <style type="text/css"> .tiny .remark-code { /*Change made here*/ font-size: 70% !important; } .extra-tiny .remark-code { /*Change made here*/ font-size: 50% !important; } </style> --- ## Discuss US-Based Geospatial and Demographic Data: Goals - Introduce you to various geographic boundaries (e.g., counties, tracts, block groups) we work with here in the US. - Introduction to Census Geocoding tools using (a) their web interface and (b) the tidygeocoder package. - Provide an overview of various datasets offered by the US Census Bureau. - Provide a detailed introduction to the American Community Survey (ACS) data. - Learn how to use R packages (e.g., censusapi, tidycensus) to seamlessly download and work with ACS data. - Learn the basics of static map making using ggplot2. --- ## Before we continue Let us pause for a minute or two before we continue with the workshop. Go to: https://api.census.gov/data/key_signup.html - Sign up for a quick API key from the Census. We will need the API key for censusapi and tidycensus packages. --- ## A brief note Part of the map making lecture will use the ggplot and dplyr packages which are part of the 'core' tidyverse packages that R-based data scientists use for our daily work. 
If you're not familiar with the ggplot syntax to create plots, a copy of my lecture materials for ggplot plot making can be found [here](https://jmtfeliciano.github.io/DATA412Fall2024/Exercise2FilledOut.html). If you're not familiar with dplyr, a copy of my lecture materials for a two-part dplyr lecture series can be found here: [part 1](https://jmtfeliciano.github.io/DATA412Fall2024/Exercise3FilledIn.html) and [part 2](https://jmtfeliciano.github.io/DATA412Fall2024/Exercise4FilledIn.html). My goal today is not to make you an expert in map making. But I will be giving you templates (e.g., in Slide 43). --- ## Geographic Identifiers (GEOIDs): The Basics. Geographic identifiers (or GEOIDs) are numeric codes that uniquely identify all administrative/legal and statistical geographic areas. - Without a common identifier among geographic and demographic datasets, researchers and other stakeholders would have a difficult time pairing the appropriate demographic data with the appropriate geographic data, thus considerably increasing data processing times and the likelihood of data inaccuracy. Here in the US, we primarily use what are called Federal Information Processing Series (FIPS) codes. - Many US-based datasets would label their geographic and demographic datasets with either GEOID or FIPS to indicate the relevant code. Datasets use GEOID and FIPS interchangeably. - If you are working with spatial data, it is best to have the FIPS code to easily merge the datasets. --- ## Geographic hierarchies <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#images/geography.png" alt="Figure 1. Geographic hierarchies in the United States. Typically, the key notable geographic levels scientists and policy makers concern themselves with are: (1) State, (2) County, (3) Census Tract, (4) Census Block, and (5) Zip Code Tabulation Areas (ZCTAs)." width="60%" /> <p class="caption">Figure 1. Geographic hierarchies in the United States. 
Typically, the key notable geographic levels scientists and policy makers concern themselves with are: (1) State, (2) County, (3) Census Tract, (4) Census Block, and (5) Zip Code Tabulation Areas (ZCTAs).</p> </div> --- ## Geographic hierarchies <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#images/geographyv2.png" alt="Figure 2. Geographic hierarchies in the United States. Visual representation of how counties, census tracts, block groups, and blocks are nested within one another." width="60%" /> <p class="caption">Figure 2. Geographic hierarchies in the United States. Visual representation of how counties, census tracts, block groups, and blocks are nested within one another.</p> </div> --- ## Federal Information Processing Standards (FIPS) <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#images/fips.png" alt="Figure 3. FIPS standards across geographic hierarchies. Again, GEOID/FIPS codes are typically what we use to identify both the geographic level and specific location we are working with. FIPS and GEOID are often used synonymously with one another." width="60%" /> <p class="caption">Figure 3. FIPS standards across geographic hierarchies. Again, GEOID/FIPS codes are typically what we use to identify both the geographic level and specific location we are working with. FIPS and GEOID are often used synonymously with one another.</p> </div> --- ## States FIPS You can get this from many websites. This specific list is from [the Census Bureau directly](https://www2.census.gov/geo/docs/reference/state.txt).
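---

### FIPS codes in practice: building GEOIDs

Since a county GEOID is just the 2-digit state FIPS followed by the 3-digit county FIPS (and a tract GEOID appends a 6-digit tract code), you can build or repair them with basic string tools. A minimal sketch in R (the FIPS values below are real codes, used purely for illustration):

.tiny[

``` r
# County GEOID = 2-digit state FIPS + 3-digit county FIPS
state_fips  <- "09"    # Connecticut
county_fips <- "009"   # New Haven County
paste0(state_fips, county_fips)   # "09009"

# Common pitfall: FIPS codes stored as numbers lose their leading zeros.
# sprintf() pads them back to the correct fixed widths before pasting.
sprintf("%02d%03d", 9, 9)         # "09009"
```
]

Keeping FIPS/GEOID columns as character strings (never numeric) is the single easiest way to avoid failed merges later.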
---

### A Quick Detour: Non-Census Data at County and Tract Level

Federal agencies and researchers are increasingly using the CDC/ATSDR Social Vulnerability Index (SVI).

__From CDC:__ "Natural disasters and infectious disease outbreaks can pose a threat to a community’s health. Socially vulnerable populations are especially at risk during public health emergencies because of factors like socioeconomic status, household composition, minority status, or housing type and transportation."

__SVI Availability:__ Data are available at the county and tract levels.

__Index Range:__ The index (labelled RPL_THEMES in the dataset) is a score between 0 (least vulnerable) and 1 (most vulnerable).

For more information about the SVI dataset, please visit the [CDC SVI website](https://www.atsdr.cdc.gov/place-health/php/svi/svi-data-documentation-download.html).

---

### Social Vulnerability Index Scoring Breakdown

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/svi_breakdown.png" alt="Figure 4. Factors that impact the index for the 2022 SVI. The latest SVI data is for 2022. The dataset calculates the SVI using Census ACS data (more on this later) for 2018-2022." width="60%" />
<p class="caption">Figure 4. Factors that impact the index for the 2022 SVI. The latest SVI data is for 2022. The dataset calculates the SVI using Census ACS data (more on this later) for 2018-2022.</p>
</div>

---

### Partial SVI Data at County-Level:
--- ### Partial SVI Data at Census Tract-Level:
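---

### Working with SVI scores in R

Because RPL_THEMES is a percentile rank between 0 and 1, flagging the most vulnerable areas is a simple filter. A minimal sketch with a toy data frame (the FIPS values and scores below are made up for illustration; real SVI files code unavailable scores as -999, which you should drop first):

.tiny[

``` r
library(dplyr)

# Toy stand-in for a tract-level SVI extract (values are made up)
svi <- data.frame(
  FIPS       = c("29189210100", "29189210200", "29510106500"),
  RPL_THEMES = c(0.12, 0.87, -999)
)

svi |>
  filter(RPL_THEMES >= 0) |>    # drop the -999 missing-value code
  filter(RPL_THEMES >= 0.75)    # keep the most vulnerable quartile
```
]

The same two filters work unchanged on the real county- or tract-level files once you read them in.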
---

## Free geocoding tools

You likely know which county you are in. However, you likely do not know the specific census tract or block group you are in.

Luckily, the Census Bureau offers free geocoding tools (e.g., a web interface, an API) that can help us!

The web interface link is: https://geocoding.geo.census.gov/geocoder/geographies/address?form

---

### Free geocoding tools in R: forward geocoding

You can use the tidygeocoder package in R. The code below is an example of forward geocoding (addresses ⮕ coordinates). By default, geocode() uses Nominatim (a geocoding service) to perform the task.

.tiny[

``` r
# run if not installed before: install.packages("tidygeocoder")
library(tidygeocoder)

addresses_df <- data.frame(address = c("60 College St, New Haven, CT"))
geocode(addresses_df, address = address)
```

```
## Passing 1 address to the Nominatim single address geocoder
```

```
## Query completed in: 1 seconds
```

```
## # A tibble: 1 × 3
##   address                        lat  long
##   <chr>                        <dbl> <dbl>
## 1 60 College St, New Haven, CT  41.3 -72.9
```
]

---

### Free geocoding tools in R: forward geocoding

If you are interested in determining the geographies (e.g., county and tract information), you need to use the "census" method, which leverages the Census API. For full output, see the next slide.
.tiny[ ``` r # run if not installed before: install.packages("tidygeocoder") library(tidygeocoder) addresses_df <- data.frame(address = c("60 College St, New Haven, CT")) results_df <- geocode(addresses_df, address = address, method = "census", full_results = TRUE, api_options = list(census_return_type = 'geographies')) ``` ] --- ### Some extractable information from results_df .tiny[ ``` r results_df$geographies.Counties ``` ``` ## [[1]] ## GEOID CENTLAT AREAWATER STATE BASENAME OID ## 1 09170 +41.2943775 653525530 09 South Central Connecticut 2759030115366160 ## LSADC FUNCSTAT INTPTLAT NAME OBJECTID ## 1 PL N +41.2901671 South Central Connecticut Planning Region 1429 ## CENTLON COUNTYCC COUNTYNS AREALAND INTPTLON MTFCC COUNTY ## 1 -072.8336986 H5 02830252 951177351 -072.8370243 G4020 170 ``` ``` r results_df$`geographies.Census Tracts` ``` ``` ## [[1]] ## GEOID CENTLAT AREAWATER STATE BASENAME OID LSADC ## 1 09170140300 +41.2995898 0 09 1403 20790338969445 CT ## FUNCSTAT INTPTLAT NAME OBJECTID TRACT CENTLON AREALAND ## 1 S +41.2995898 Census Tract 1403 4777 140300 -072.9326303 719325 ## INTPTLON MTFCC COUNTY ## 1 -072.9326303 G5020 170 ``` ] --- ### What other geographic details are available? Pay close attention to column names with a geographies prefix. .tiny[ ``` r colnames(results_df)[7:17] ``` ``` ## [1] "geographies.States" ## [2] "geographies.Combined Statistical Areas" ## [3] "geographies.County Subdivisions" ## [4] "geographies.Urban Areas" ## [5] "geographies.Incorporated Places" ## [6] "geographies.Counties" ## [7] "geographies.2024 State Legislative Districts - Upper" ## [8] "geographies.2024 State Legislative Districts - Lower" ## [9] "geographies.2020 Census Blocks" ## [10] "geographies.Census Tracts" ## [11] "geographies.119th Congressional Districts" ``` ] --- ### Free geocoding tools in R: reverse geocoding You can use the tidygeocoder package in R. The code below is an example of reverse geocoding (coordinates ⮕ addresses). 
.tiny[

``` r
reverse_geo(lat = 41.30374, long = -72.93216)
```

```
## Passing 1 coordinate to the Nominatim single coordinate geocoder
```

```
## Query completed in: 1 seconds
```

```
## # A tibble: 1 × 3
##     lat  long address
##   <dbl> <dbl> <chr>
## 1  41.3 -72.9 Laboratory of Epidemiology and Public Health, 60, College Street,…
```
]

---

## List of Census Surveys and Datasets

.pull-left[
- The US Census Bureau conducts 130+ surveys each year.
- A detailed list can be accessed by clicking [this](https://www.census.gov/programs-surveys/surveys-programs.html).
- Let us quickly explore this list using a web browser.
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/census_survey.png" alt="Figure 5. Screenshot of page showing all the surveys performed by the US Census Bureau." width="100%" />
<p class="caption">Figure 5. Screenshot of page showing all the surveys performed by the US Census Bureau.</p>
</div>
]

---

### Quick overview of Small Area Health Insurance Estimates (SAHIE)

.pull-left[
You may access the large yearly SAHIE datasets by clicking [this](https://www2.census.gov/programs-surveys/sahie/datasets/time-series/estimates-acs/).

__Recommended:__ Access the data using the SAHIE interactive tool, which greatly minimizes data cleaning/subsetting tasks. The tool can be accessed by clicking [this](https://www.census.gov/data-tools/demo/sahie/#/).
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/sahie.png" alt="Figure 6. Screenshot of the SAHIE dashboard." width="100%" />
<p class="caption">Figure 6. Screenshot of the SAHIE dashboard.</p>
</div>
]

---

## The American Community Survey (ACS) Data

The American Community Survey (ACS) is an ongoing yearly survey. It is arguably the most widely used dataset from the Census Bureau.

__Census Bureau:__ "[ACS] is the premier source for detailed population and housing information about our nation."
__Two yearly versions:__ ACS 1-year and ACS 5-year. Note: For older data (2007-2013), 3-year estimates also exist.

- The ACS 1-year release estimates data for areas with populations of 65,000+.
- The ACS 5-year release estimates data for all areas regardless of population size.
- Many datasets provided by other federal agencies are subsets of, or are created in part using, ACS data.

__Language:__ If someone says they're using the 5-year 2020 ACS data, they're referring to the 2016-2020 5-year ACS data.

---

## The American Community Survey (ACS) Data

There is a lot of ACS-related documentation online.

__What to look for and remember:__ Table Shells. Table shells (particularly for detailed tables) provide a comprehensive list of variable documentation for ACS data.

- Table shells are provided [yearly](https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.html). If you are doing a longitudinal study using ACS data (e.g., a 10-year study), it is a good idea to check the relevant yearly table shells to see if the variables of interest are available for all years.
- Alternative ACS documentation: The API documentation also lists all the available ACS variables [here](https://api.census.gov/data/2022/acs/acs5/variables.html).

Let us download the ACS 2022 Table Shells and go over them together. Let us search for disability-related variables.

---

## The American Community Survey (ACS) Table Shells

.pull-left[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/TableShell.png" alt="Figure 7. Excel Table Shell." width="98%" />
<p class="caption">Figure 7. Excel Table Shell.</p>
</div>
]

.pull-right[
<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/API.png" alt="Figure 8. API Table Shell." width="98%" />
<p class="caption">Figure 8. API Table Shell.</p>
</div>
]

UniqueID from the Excel Table Shell (or Name in the API Table Shell) represents the variable(s) we want to get from the Census ACS data.
Technically, Name in the API Table Shell is more accurate since it carries the suffix "E", which stands for "estimate". Do note that if you're using the Excel Table Shell, you'll eventually need to add the suffix "E" when requesting data from the Census API.

---

### censusapi package: pulling data into R.

Suppose we are interested in county-level population data. According to the ACS 2022 table shell, the variable we need to use is: B01001_001E.

We can use the censusapi package to get census data directly into R.

.tiny[

``` r
# run this if not installed: install.packages("censusapi")
library(censusapi)

Sys.setenv(CENSUS_KEY="") # put your API key here

population_data <- getCensus(
  name = "acs/acs5",        # requests ACS5 data
  vintage = 2022,           # requests 2022 data
  vars = c("B01001_001E"),  # requested variable
  region = "county:*")      # requested geography

head(population_data)
```

```
##   state county B01001_001E
## 1    01    001       58761
## 2    01    003      233420
## 3    01    005       24877
## 4    01    007       22251
## 5    01    009       59077
## 6    01    011       10328
```
]

---

## Examples of Other Important censusapi calls

.tiny[
An example of asking for multiple variables:

``` r
data1 <- getCensus(name = "acs/acs5",
  vintage = 2019,
  vars = c("B28002_001E", "B28002_002E"),
  region = "county:*")
```

An example of asking for Connecticut-only county-level data (Note: CT's FIPS code is 09).

``` r
data2 <- getCensus(name = "acs/acs5",
  vintage = 2019,
  vars = c("B28002_001E", "B28002_002E"),
  region = "county:*",
  regionin = "state:09")
```

An example of asking for Missouri-only tract-level data (Note: MO's FIPS code is 29).

``` r
data3 <- getCensus(name = "acs/acs5",
  vintage = 2019,
  vars = c("B28002_001E", "B28002_002E"),
  region = "tract:*",
  regionin = "state:29")
```
]

---

## The typical recipe: Making Maps the Simple Way

1. We need a shapefile, a digital format for storing geographic locations and associated attribute information. We will use the tigris package to get the data directly from the US Census Bureau.
2.
We need the data we are interested in mapping.
3. Merge the data (e.g., SVI data) to the shapefile (which is also a data frame).
4. Leverage the ggplot2 package to render the map.

For this example: We will map the SVI data from the CDC for Missouri. You may download the relevant file from my Github repo (or you may download the file from the next slide).

---

## Quick data look at MO's 2019 Tract-level SVI Data
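A quick way to take this look yourself is `glimpse()` from dplyr (this assumes the CSV on my Github repo, used in the following slides, is still available):

.tiny[

``` r
library(tidyverse)

# Read the MO tract-level SVI extract and inspect its columns
mo_svi_data <- read_csv("https://raw.githubusercontent.com/jmtfeliciano/teachingdata/refs/heads/main/MissouriSVI2019.csv")
glimpse(mo_svi_data)   # prints column names, types, and first values
```
]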
---

## Creating SVI Map for MO

.tiny[
Step 1: Retrieve the shapefile needed.

``` r
library(tigris)
mo_shape_file <- tracts(state = "MO", year = 2019)
```

Step 2: Load MO SVI data into R.

``` r
library(tidyverse)
mo_svi_data <- read_csv("https://raw.githubusercontent.com/jmtfeliciano/teachingdata/refs/heads/main/MissouriSVI2019.csv") |>
  mutate(GEOID = as.character(FIPS)) # create a character GEOID column from FIPS

# In Base R:
# mo_svi_data <- read.csv("https://raw.githubusercontent.com/jmtfeliciano/teachingdata/refs/heads/main/MissouriSVI2019.csv")
# mo_svi_data$GEOID <- as.character(mo_svi_data$FIPS)
```

Step 3: Merge SVI data into the shapefile.

``` r
mo_shape_file_v2 <- left_join(mo_shape_file, mo_svi_data, by = "GEOID")
```

Step 4: Plot the map (Note: RPL_THEMES is the SVI variable).

``` r
ggplot(data = mo_shape_file_v2) +
  geom_sf(aes(fill = RPL_THEMES))
```
]

---

## Generating Map

.pull-left[
.tiny[

``` r
ggplot(data = mo_shape_file_v2) +
  geom_sf(aes(fill = RPL_THEMES))
```
]
]

.pull-right[
<img src="data:image/png;base64,#images/svi_mo_map.png" width="100%" style="display: block; margin: auto;" />
]

---

## Further Customizations

.pull-left[
.tiny[

``` r
# Added theme_void()
# to remove grid and grey background
ggplot(data = mo_shape_file_v2) +
  geom_sf(aes(fill = RPL_THEMES)) +
  theme_void()
```
]
]

.pull-right[
<img src="data:image/png;base64,#images/svi_mo_map_v2.png" width="100%" style="display: block; margin: auto;" />
]

---

## Further Customizations Part 2

.pull-left[
.tiny[

``` r
# Further customizes labels and color gradient
ggplot(data = mo_shape_file_v2) +
  geom_sf(aes(fill = RPL_THEMES)) +
  theme_void() +
  scale_fill_gradient(low="#1fa187", high="#440154") +
  labs(fill='MO-Specific SVI')
```
]
]

.pull-right[
<img src="data:image/png;base64,#images/svi_mo_map_v3.png" width="100%" style="display: block; margin: auto;" />
]

---

## tigris package shapefiles

In the previous example, we used tracts(state = "MO") to get the tract-specific shapefile for MO. Many other shapefiles are available.
Two key examples: For a state-level map: states(). For a county-level map: counties().

To the best of my knowledge, there are 40-50 shapefiles available (e.g., AIANNH [American Indian, Alaska Native and Native Hawaiian] boundaries, zip code tabulation area (ZCTA) boundaries).

Speaking of ZCTAs, here is a brief comment on them.

---

## Detour: Zip Code Tabulation Area (ZCTA)

ACS datasets are also available at the ZCTA level. This might sound like the zip codes we use in our addresses, but they are not the same.

Most of the time, your postal zip code is the same as the ZCTA. However, zip codes primarily used by the Postal Service for P.O. boxes will likely belong to a different ZCTA. The same is true for areas with few residential addresses (or areas that are primarily occupied by commercial businesses).

There are crosswalks available that can help you convert between postal zip codes and ZCTAs (e.g., [crosswalk from censusreporter](https://github.com/censusreporter/acs-aggregate/blob/master/crosswalks/zip_to_zcta/ZIP_ZCTA_README.md)).

__Please talk to a geographer or demographer before doing any comprehensive work with zip codes or ZCTAs.__

---

## tidycensus package and census data.

tidycensus is an R package that allows users to interface with a select number of the US Census Bureau’s data APIs and return data frames.

If your goal is to visualize Census data (particularly ACS data) via a map, tidycensus is the most convenient package to use. One of its advantages is that it has the option to return not just the requested variable(s) but also the corresponding shapefile needed.

Before going further, load the tidycensus package:

``` r
# run install.packages("tidycensus") if not installed
library(tidycensus)
```

---

## tidycensus package and census data.

The script below uses `load_variables()` to list the available variables within the 2022 ACS5 data--this is a table shell, but loaded into R as a data frame.
Remember, when someone refers to '2022 ACS 5-year data', the estimates actually use data collected over 2018-2022.

``` r
variable_list_2022 <- load_variables(2022, "acs5", cache = TRUE)
nrow(variable_list_2022)
```

```
## [1] 28152
```

---

## tidycensus package and census data

Look familiar?

.tiny[

``` r
head(variable_list_2022)
```

```
## # A tibble: 6 × 4
##   name        label                                   concept          geography
##   <chr>       <chr>                                   <chr>            <chr>
## 1 B01001A_001 Estimate!!Total:                        Sex by Age (Whi… tract
## 2 B01001A_002 Estimate!!Total:!!Male:                 Sex by Age (Whi… tract
## 3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years  Sex by Age (Whi… tract
## 4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years   Sex by Age (Whi… tract
## 5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years Sex by Age (Whi… tract
## 6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years Sex by Age (Whi… tract
```
]

---

## tidycensus package and census data

Advanced recipe: using basic text mining skills in R to find tables related to Medicare.

.tiny[

``` r
variable_list_2022 |>
  filter(str_detect(concept, regex("medicare", ignore_case = TRUE))) |>
  relocate(concept) # relocate() moves concept into the first column
```

```
## # A tibble: 24 × 4
##    concept                         name        label                   geography
##    <chr>                           <chr>       <chr>                   <chr>
##  1 Allocation of Medicare Coverage B992706_001 Estimate!!Total:        tract
##  2 Allocation of Medicare Coverage B992706_002 Estimate!!Total:!!Allo… tract
##  3 Allocation of Medicare Coverage B992706_003 Estimate!!Total:!!Not … tract
##  4 Medicare Coverage by Sex by Age C27006_001  Estimate!!Total:        tract
##  5 Medicare Coverage by Sex by Age C27006_002  Estimate!!Total:!!Male: tract
##  6 Medicare Coverage by Sex by Age C27006_003  Estimate!!Total:!!Male… tract
##  7 Medicare Coverage by Sex by Age C27006_004  Estimate!!Total:!!Male… tract
##  8 Medicare Coverage by Sex by Age C27006_005  Estimate!!Total:!!Male… tract
##  9 Medicare Coverage by Sex by Age C27006_006  Estimate!!Total:!!Male… tract
## 10 Medicare Coverage by Sex by Age C27006_007  Estimate!!Total:!!Male… tract
## # ℹ 14 more rows
```
]

Again: If you're not familiar with dplyr, a copy of my lecture materials for a two-part dplyr lecture series can be found here: [part 1](https://jmtfeliciano.github.io/DATA412Fall2024/Exercise3FilledIn.html) and [part 2](https://jmtfeliciano.github.io/DATA412Fall2024/Exercise4FilledIn.html).

---

## tidycensus package

Task: Suppose we want to map the median % of household income spent on rent for each state using variable B25071_001.

.pull-left[
__What to run for state-level ACS5 data:__

.tiny[

``` r
library(tidycensus)
census_api_key("YOUR CENSUS API KEY HERE")

shapefile_with_data <- get_acs(
  geography = "state",
  variables = "B25071_001",
  year = 2019,
  survey = "acs5",
  geometry = TRUE,
  shift_geo = TRUE
)
```
]
]

.pull-right[
The key part here: Make sure geometry = TRUE, as the default is FALSE. By setting geometry to TRUE, you are instructing get_acs() to return the final data as an sf object (shapefile) that is ready for map rendering via ggplot2.

shift_geo = TRUE is also important, as it repositions Alaska, Hawaii, and Puerto Rico closer to the contiguous United States so the whole country fits compactly on one map.
]

---

## Rendering the map

.pull-left[
.tiny[

``` r
ggplot(data = shapefile_with_data) +
  geom_sf(aes(fill = estimate), color = NA) +
  theme_void() +
  labs(fill='Median Gross Rent as a % of Household Income') +
  scale_fill_gradient(low="#1fa187", high="#440154") +
  theme(legend.position="bottom")
```
]
]

.pull-right[
<img src="data:image/png;base64,#images/rent_map.png" width="100%" style="display: block; margin: auto;" />
]

Note: "#1fa187" and "#440154" above are hexadecimal color codes. An excellent detailed guide on colors in R can be found by clicking [this resource from UCSB](https://www.nceas.ucsb.edu/sites/default/files/2020-04/colorPaletteCheatsheet.pdf).

---

## Importance of shifting geometry

I mentioned earlier that shift_geo = TRUE is important.
Here's the map you'd generate without setting that argument to TRUE.

<img src="data:image/png;base64,#images/without_shift.png" width="100%" style="display: block; margin: auto;" />

---

## Rendering the map: Example 2 (Full Template)

.tiny[

``` r
library(tidycensus)
library(tidyverse)

census_api_key("YOUR CENSUS API KEY HERE")

ct_shapefile_with_data <- get_acs(
  geography = "county",
  state = "CT",
  variables = "B25071_001",
  year = 2019,
  survey = "acs5",
  geometry = TRUE
  # shift_geo is not needed if you're not mapping the entire US
)

ggplot(data = ct_shapefile_with_data) +
  geom_sf(aes(fill = estimate), color = NA) +
  theme_void() +
  labs(fill='Median Gross Rent as a % of Household Income') +
  scale_fill_gradient(low="white", high="black")
```
]

See the next slide for the rendered map.

---

## Rendering the map: Example 2

<img src="data:image/png;base64,#images/ct_map.png" width="100%" style="display: block; margin: auto;" />

---

## Impending syntax change for tidycensus: nationwide map

.tiny[
__Current syntax:__

``` r
shapefile_with_data <- get_acs(
  geography = "state",
  variables = "B25071_001",
  year = 2019,
  survey = "acs5",
  geometry = TRUE,
  shift_geo = TRUE
)
```

__Future release syntax (shift_geometry() comes from the tigris package):__

``` r
shapefile_with_data <- get_acs(
  geography = "state",
  variables = "B25071_001",
  year = 2019,
  survey = "acs5",
  geometry = TRUE
) |>
  shift_geometry()
```
]

---

class: center, middle

## Exercise to do on your own time: Follow the full template provided in Slide 43, then pick another variable to map for another state.

---

## Other functions from tidycensus

`get_estimates()` can give you detailed information about population characteristics. In your own time, try changing the value of `product` to the following: "components", "population", or "characteristics".
.tiny[

``` r
get_estimates(geography = "state", product = "components", vintage = 2023)
```

```
## Using the Vintage 2023 Population Estimates
```

```
## # A tibble: 676 × 5
##    GEOID NAME    variable          year    value
##    <chr> <chr>   <chr>            <int>    <dbl>
##  1 01    Alabama BIRTHS            2023 58251
##  2 01    Alabama DEATHS            2023 59813
##  3 01    Alabama NATURALCHG        2023 -1562
##  4 01    Alabama INTERNATIONALMIG  2023  5384
##  5 01    Alabama DOMESTICMIG       2023 30744
##  6 01    Alabama NETMIG            2023 36128
##  7 01    Alabama RESIDUAL          2023    -1
##  8 01    Alabama RBIRTH            2023    11.4
##  9 01    Alabama RDEATH            2023    11.7
## 10 01    Alabama RNATURALCHG       2023    -0.307
## # ℹ 666 more rows
```
]

---

## Other functions from tidycensus

`get_flows()` provides detailed migration flow data (if available).

.tiny[

``` r
get_flows(
  geography = "county",
  state = "NY",
  county = "New York",
  year = 2019
)
```

```
## # A tibble: 2,019 × 7
##    GEOID1 GEOID2 FULL1_NAME                FULL2_NAME    variable estimate   moe
##    <chr>  <chr>  <chr>                     <chr>         <chr>       <dbl> <dbl>
##  1 36061  <NA>   New York County, New York Africa        MOVEDIN       468   182
##  2 36061  <NA>   New York County, New York Africa        MOVEDOUT       NA    NA
##  3 36061  <NA>   New York County, New York Africa        MOVEDNET       NA    NA
##  4 36061  <NA>   New York County, New York Asia          MOVEDIN      9911  1039
##  5 36061  <NA>   New York County, New York Asia          MOVEDOUT       NA    NA
##  6 36061  <NA>   New York County, New York Asia          MOVEDNET       NA    NA
##  7 36061  <NA>   New York County, New York Central Amer… MOVEDIN      1553   857
##  8 36061  <NA>   New York County, New York Central Amer… MOVEDOUT       NA    NA
##  9 36061  <NA>   New York County, New York Central Amer… MOVEDNET       NA    NA
## 10 36061  <NA>   New York County, New York Caribbean     MOVEDIN      2783   712
## # ℹ 2,009 more rows
```
]

---

## Another mapping tool: mapview package

If you love interactive maps, check out the mapview package. It builds on the leaflet package but is a lot simpler to use!

Let us create ct_shapefile_tract first.
``` r
ct_shapefile_tract <- get_acs(
  geography = "tract",
  state = "CT",
  variables = "B25071_001",
  year = 2019,
  survey = "acs5",
  geometry = TRUE
  # shift_geo is not needed if you're not mapping the entire US
)
```

---

## Another mapping tool: mapview package

If you love interactive maps, check out the mapview package. It builds on the leaflet package but is a lot simpler to use!

.tiny[

``` r
# install.packages("mapview")
library(mapview)

mapview(ct_shapefile_tract, zcol = "estimate",
        layer.name = 'Rent as % of Income')
```
]

See the next slide for the map output.

---

# Mapview output
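---

## Sharing the mapview output

If you want to share the interactive map outside of RStudio, it can be saved as a standalone HTML file. A minimal sketch (this assumes the ct_shapefile_tract object from the earlier slide; mapshot() may additionally need the webshot package if you ask for a static PNG instead):

.tiny[

``` r
library(mapview)

m <- mapview(ct_shapefile_tract, zcol = "estimate",
             layer.name = 'Rent as % of Income')

# Save the interactive map as a self-contained HTML file
mapshot(m, url = "ct_rent_map.html")
```
]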
---

class: center, middle

## Go over what to learn after this semester.

---

## What to learn after this semester: Non-R Materials.

- One cool aspect of being a data scientist: you always get to learn new things (e.g., new programming languages, statistical models, new technology stacks).
- Learn Python on top of R.
- Learn SQL.
- Get familiar with cloud platforms (e.g., Google Cloud, AWS, Azure).
- Get familiar with dashboarding tools (e.g., PowerBI, Tableau).

---

## What to learn after this semester: R Packages.

- For maps and geospatial data: sf, tidycensus, leaflet.
- For static and interactive dashboards: flexdashboard, shiny, plotly.

<br/>

> My example that uses both: https://jmtfeliciano.github.io/HIVPreventionDashboard

<br/>

- For basic NLP (text mining, sentiment analysis): tidytext
- For time-series forecasting (e.g., ARIMA): forecast, fable

---

## What to learn after this semester: R tidyverse Packages.

- Work through the R data science textbook via https://r4ds.hadley.nz/

---

## What to learn after this semester: Intro to Machine Learning.

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#images/book.png" alt="Figure 9. Image of the 'An Introduction to Statistical Learning' Book, which is free. A free PDF copy is available at statlearning.com." width="30%" />
<p class="caption">Figure 9. Image of the 'An Introduction to Statistical Learning' Book, which is free. A free PDF copy is available at statlearning.com.</p>
</div>

---

## What to learn after this semester: Advanced Statistics Topics

- Bayesian statistics: rjags, rstan.
- Causal Inference: https://www.r-causal.org/

---

## What to learn after this semester: A future RStudio Alternative

Posit, the company that owns RStudio, is currently developing a new alternative to RStudio called Positron. Positron will be optimized for both R and Python use.
For more information about Positron, or if you want to try the pre-release (beta) version, visit: https://positron.posit.co/

---

class: center, middle

## If you have them, I will answer all your general questions about R, RStudio, internships, or data science as a field.

---

## Thank you for having me!

Thank you for attending today's spatial data and mapping workshop.

Email: For general questions, please email me at `jfeliciano@american.edu`.